An improved adaptive neuro fuzzy inference system model using conjoined metaheuristic algorithms for electrical conductivity prediction

Precise prediction of water quality parameters plays a significant role in issuing early warnings of water pollution and in making better decisions for water resources management. As one of the influential indicative parameters, electrical conductivity (EC) has a crucial role in calculating the proportion of mineralization. In this study, an adaptive hybrid of differential evolution and particle swarm optimization (A-DEPSO) is integrated with the adaptive neuro fuzzy inference system (ANFIS) model for EC prediction. The A-DEPSO method uses unique mutation and crossover processes to boost the global and local search mechanisms, respectively. It also uses a refreshing operator to prevent the solution from being trapped in local optima. This study uses the A-DEPSO optimizer in the ANFIS training phase to eliminate defects and accurately predict the monthly EC water quality parameter at the Maroon River in the southwest of Iran. Accordingly, the dataset recorded at the Tange-Takab station from 1980 to 2016 was used to develop the ANFIS-A-DEPSO model. Besides, wavelet analysis was coupled with the proposed algorithm: the original EC time series was decomposed into sub-time series through two mother wavelets to boost the prediction certainty. Subsequently, the statistical metrics of the standalone ANFIS, least-square support vector machine (LSSVM), multivariate adaptive regression spline (MARS), generalized regression neural network (GRNN), wavelet-LSSVM (W-LSSVM), wavelet-MARS (W-MARS), wavelet-ANFIS (W-ANFIS) and wavelet-GRNN (W-GRNN) models were compared. The results showed that the W-ANFIS-A-DEPSO model not only raised the EC prediction certainty remarkably, but, with the Dmey mother wavelet (R = 0.988, RMSE = 53.841, and PI = 0.485), also had the edge over the other models in terms of EC prediction. Moreover, the W-ANFIS-A-DEPSO improved the RMSE by 80% compared to the standalone ANFIS-A-DEPSO model. Hence, the W-ANFIS-A-DEPSO model can produce a closer approximation of the EC value and is likely to act as a promising procedure for simulating EC data.


Main objectives and contributions.
Over the past decade, researchers around the world, particularly hydrologists, have shown a remarkably growing interest in finding effective computational models for surface WQ simulation 53,54 , accompanied by noteworthy advancements in modeling. Fully understanding surface WQ, as a natural problem, remains a challenging issue, and the key goal of devising novel hybridized versions of ML models is to address this challenge more effectively. Standalone ML models are limited by the need for internal parameter tuning, data clustering and cleaning, data preprocessing and several other steps. Hence, the current study is motivated to develop an efficient hybrid model coupled with wavelet theory, called the wavelet adaptive neuro fuzzy inference system coupled with an adaptive hybrid of differential evolution and particle swarm optimization (W-ANFIS-A-DEPSO). The main contribution of this study is to hybridize the ANFIS model with an efficient optimization method called A-DEPSO, yielding a novel hybrid model. The A-DEPSO algorithm uses a powerful local and global search mechanism to avoid local solutions and move toward the global solution. In addition, it uses an adaptive control parameter to help the algorithm balance exploration and exploitation. The model is designed to predict the EC index on a monthly basis at the Maroon River, Iran. Accordingly, a dataset of monthly discharge (Q) and EC measured over three decades, from 1980 to 2016, at the Tange-Takab station is used for modeling.
A set of preprocessing analyses, essential to the training of the proposed model, is carried out to select the appropriate input parameters for the predictive models. The best subset regression approach is used to identify the best input combinations for EC. The predictive abilities of the standalone and wavelet-based ML models are then evaluated for the selected combinations through a wide range of statistical metrics, graphical tools, and error analysis.

Methodology
Adaptive neuro fuzzy inference system (ANFIS). ANFIS is a hybrid method merging the ANN and fuzzy logic, first introduced by Ref. 55 . ANFIS uses IF-THEN fuzzy rules (FRs) to describe the knowledge linking the input and target datasets of a modeling problem 56,57 . The main structure of the ANFIS model is displayed in Fig. 1. In this model, the Takagi-Sugeno inference procedure plays an integral role in generating the IF-THEN rules that map the input to the output,
in which α_1, β_1, γ_1 and α_2, β_2, γ_2 denote the consequence parameters, and D_1, D_2, C_1, and C_2 are considered membership functions (MFs). The ANFIS includes five layers and takes multiple inputs but produces just one output. This structure is elaborated as follows. In the first layer, each node is controlled by a specific function parameter, which is then used to generate the membership degree (ψ) by the bell MF,
in which a_k, e_k, and m_k are the parameters of the membership function. In the second layer, each node output is regarded as an input signal and indicates the firing strength of each rule.
Eventually, layer 5 computes the network output.
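The five-layer structure can be illustrated for two inputs and two rules. The following is a minimal sketch, not the authors' implementation: the generalized bell form 1 / (1 + |(x − m) / a|^(2e)), the reading of a_k, e_k, m_k as width, shape and center, the product operator in layer 2, the normalization and weighting in layers 3 and 4 (as in the standard ANFIS description), and all function names are assumptions.

```python
def bell_mf(x, a, e, m):
    """Generalized bell membership degree (assumed form): 1 / (1 + |(x - m) / a|^(2e))."""
    return 1.0 / (1.0 + abs((x - m) / a) ** (2.0 * e))


def anfis_forward(x, y, premise, consequent):
    """Forward pass of a first-order Sugeno ANFIS with one rule per premise entry.

    premise:    list of ((a, e, m) for input x, (a, e, m) for input y), one pair per rule
    consequent: list of (alpha, beta, gamma), one triple per rule
    """
    # Layers 1 and 2: membership degrees and rule firing strengths (product, assumed)
    w = [bell_mf(x, *px) * bell_mf(y, *py) for px, py in premise]
    # Layer 3: normalized firing strengths
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weighted rule outputs f_i = alpha_i * x + beta_i * y + gamma_i
    weighted = [wb * (al * x + be * y + ga)
                for wb, (al, be, ga) in zip(w_bar, consequent)]
    # Layer 5: overall network output
    return sum(weighted)


# Hypothetical parameters: two rules centered at 0 and 2
premise = [((1.0, 2.0, 0.0), (1.0, 2.0, 0.0)),
           ((1.0, 2.0, 2.0), (1.0, 2.0, 2.0))]
consequent = [(1.0, 1.0, 0.0), (0.0, 0.0, 10.0)]
out = anfis_forward(1.0, 1.0, premise, consequent)
```

At the point (1, 1) both rules fire equally in this example, so the output is the plain average of the two rule outputs.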
Least square support vector machine (LSSVM). Suykens and Vandewalle introduced a variant of the support vector machine (SVM) called LSSVM, which solves a set of linear equations to raise the convergence speed [58][59][60] , whereas the standard SVM is trained with a quadratic programming technique 61 . Put simply, the LSSVM has the edge over the SVM in terms of simple structure and high convergence speed, which has made it a popular method in regression and classification 61,62 . The model is formulated for a training dataset (x_n, y_n), n = 1, 2, …, N, where x_n and y_n are the input and output data, respectively.
Subject to: y_n = ψ^T θ(x_n) + b + ζ_n, n = 1, 2, …, N, (10)
where the penalty parameter weights the regression errors, ζ is the regression error, θ(x_n) is a non-linear function, ψ^T is the transposed output layer vector, and b is a bias parameter. In addition, Eqs. (9) and (10) may be expressed by the Lagrange procedure, in which β_n is a Lagrange multiplier. Through the Karush-Kuhn-Tucker (KKT) conditions, the solutions are obtained, in which δ is a fixed parameter.
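The practical consequence of the KKT conditions is that LSSVM training reduces to one linear system in the bias b and the multipliers β_n. The sketch below shows the standard form of that system with NumPy. It is an illustration under assumptions: an RBF kernel is assumed and δ is read as its width, the name `penalty` stands for the penalty parameter, and the toy data are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, delta):
    """RBF kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 * delta^2)) (assumed kernel)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * delta ** 2))


def lssvm_fit(X, y, penalty, delta):
    """Solve [[0, 1^T], [1, K + I / penalty]] [b, beta]^T = [0, y]^T."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, delta) + np.eye(n) / penalty
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b, multipliers beta


def lssvm_predict(X_train, beta, b, X_new, delta):
    """Prediction y(x) = sum_n beta_n * K(x, x_n) + b."""
    return rbf_kernel(X_new, X_train, delta) @ beta + b


# Hypothetical toy data: one input, smooth target
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = np.sin(2.0 * np.pi * X[:, 0])
b, beta = lssvm_fit(X, y, penalty=1e4, delta=0.2)
y_hat = lssvm_predict(X, beta, b, X, delta=0.2)
```

Because only a linear solve is involved, no quadratic programming step is needed, which is the speed advantage described above.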

Generalized regression neural network (GRNN).
In 1991, Specht 63 introduced a probabilistic neural network based on the radial basis function (RBF), known as the generalized regression neural network (GRNN). Nowadays, this model is commonly used for classification and regression in non-linear fitting problems with large samples, and it operates as a nonparametric kernel regression network. Overall, the GRNN learning algorithm is less prone to local minima, and the model has fewer adaptation parameters than the backpropagation and RBF artificial neural networks 64 . The model comprises four layers: an input layer, a radial (pattern) layer, a regression (summation) layer, and an output layer, with the radial and regression layers located between the input and output layers 65 .
In the pattern (radial neurons) layer, which holds the input data of the training step, the number of neurons is identical to the number of data sample points. Besides, the summation layer contains one neuron, distinct from those feeding the output, that estimates the density function, while its other neurons are devoted to output estimation. To sum up, the GRNN model requires less computation time than other ANNs, since it maps predictors to the target directly 65 . The model uses a single control parameter, called the spread parameter, which sets the spread of the RBF and regulates the function to obtain the best fit.
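The four layers map onto a few lines of code, since a GRNN prediction is a kernel-weighted average of the training targets. The following is a minimal sketch assuming a Gaussian RBF with width `spread`; the function name and the toy data are hypothetical.

```python
import math


def grnn_predict(x_train, y_train, x_new, spread):
    """GRNN output for one query point: Gaussian-weighted average of training targets."""
    # Pattern layer: one radial neuron per training sample
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(xi, x_new)) / (2.0 * spread ** 2))
        for xi in x_train
    ]
    # Summation layer: weighted targets (numerator) and density estimate (denominator)
    numerator = sum(w * yi for w, yi in zip(weights, y_train))
    denominator = sum(weights)
    # Output layer: ratio of the two summation neurons
    return numerator / denominator


x_train = [(0.0,), (1.0,), (2.0,)]
y_train = [0.0, 10.0, 20.0]
mid = grnn_predict(x_train, y_train, (1.0,), spread=0.5)
near_first = grnn_predict(x_train, y_train, (0.0,), spread=0.05)
```

A small spread makes the prediction follow the nearest training sample, while a large spread smooths the output toward the mean of the targets, which is why the spread is the only parameter to tune.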

Multivariate adaptive regression spline (MARS).
The MARS method, presented by Friedman 66 , is an extension of stepwise linear regression (SLR) used to solve modeling problems with many input parameters. MARS operates differently from SLR: it fits different slopes (i.e., linear basis functions (LBFs)) over different sub-ranges of each variable, whilst the SLR uses one slope per input variable 67 . Therefore, MARS can extract more information from the data than SLR to describe how an explanatory variable affects the dependent variable. Consequently, MARS is built on the LBF structure and stems from the SLR family 68 . It does not need prior knowledge to determine the number and parameters of the LBFs, which gives it the edge over SLR. The connection between input and output data is expressed through a set of elementary LBFs, and an LBF is formulated as follows, in which B is a threshold (knot) value that divides the x range into sub-ranges, D_n is a basis function, and x denotes the input dataset. The fundamental MARS formulation is expressed as follows, where G(x) is the output, N is the total number of weighting factors, and α_0, α_1, …, α_N are the weighting factors of the MARS method. There are two fundamental steps in the MARS application process: the forward and backward stages. The forward stage reduces the training error by adding LBFs to the model, and it ends when the specified total number of LBFs is reached. The backward stage, on the other hand, reduces overfitting by eliminating the redundant LBFs.
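How piecewise-linear basis functions give different slopes on each side of a threshold can be seen in a short example. This is a sketch under assumptions: the hinge forms max(0, x − B) and max(0, B − x) are the usual MARS basis functions and are assumed here, the output is taken as G(x) = α_0 plus the weighted sum of basis functions, and the coefficients are hypothetical.

```python
def mars_output(x, alpha0, terms):
    """Evaluate G(x) = alpha0 + sum(alpha_n * D_n(x)) for a single input x.

    terms: list of (alpha_n, knot, direction), where direction +1 selects the
    hinge max(0, x - knot) and direction -1 selects max(0, knot - x).
    """
    total = alpha0
    for alpha, knot, direction in terms:
        total += alpha * max(0.0, direction * (x - knot))
    return total


# Hypothetical model with one knot at B = 3: slope 2 to the right, slope -0.5 to the left
terms = [(2.0, 3.0, +1), (0.5, 3.0, -1)]
right = mars_output(5.0, 1.0, terms)
left = mars_output(1.0, 1.0, terms)
at_knot = mars_output(3.0, 1.0, terms)
```

The forward stage would add such hinge pairs at new knots, and the backward stage would drop the terms that contribute least.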
Estimating the sub-models is necessary and is done using the generalized cross-validation (GCV) index 69,70 , in which MSE is the mean square error, m is the number of LBFs, n is the number of observations in the training dataset, and pen is a penalty factor whose recommended range is given by Friedman and Jekabsons 69,70 .
Wavelet theory. To analyze non-stationary signals appropriately, the wavelet transform (WT) is used as an efficient method. It is more flexible than the Fourier transform because it provides flexibility between the time scale and frequency 71 . Likewise, it has the advantage of analyzing signals at diverse degrees of the time scale. Put simply, a wavelet is a fluctuating function of time whose energy is restricted to a fixed span of time. Provided that ϕ denotes the mother wavelet, the continuous wavelet transform (CWT) is defined by the following equation 72,73 , where the factor b is a scale that describes the stretch or duration of the wavelet, and the factor c is a translation parameter that supplies the required time localization and defines the position of the wavelet on the time axis. In addition, the discrete wavelet transform (DWT) is more commonly used for analyzing time series, because discrete time series are conventional in hydro-climatological works. At any position in the signal (c) and for any scale value (b), the wavelet coefficients are obtained by the following equation. Regarding the DWT, the translation and scale factors are discretized, where k and l are integers. By substituting the discretized b and c, the following relation is obtained (25); the wavelet function is therefore a discrete wavelet, and the DWT follows from it.
Proposed ANFIS-A-DEPSO. The ANFIS model uses a classical optimization method to minimize the difference between the target and estimated outputs.
The optimization method combines the least squares solver (LSS) and gradient descent (GD) methods. The optimal MFs of the input parameters and the coefficients of the linear relations of the FRs are determined by this hybrid optimization method during the training stage. One of the most severe criticisms of classical optimization methods is that they get stuck in local solutions 36 , for which metaheuristic optimization methods such as A-DEPSO can be a helpful choice 74 . The flowchart of the A-DEPSO algorithm coupled with the ANFIS model is displayed in Fig. 2 and described in the following section.

Adaptive hybrid of DE and PSO (A-DEPSO).
In this study, the adaptive hybrid of DE and PSO (A-DEPSO) introduced by Ref. 74 is used to determine the ANFIS model's decision parameters; its main operators are mutation, crossover, refreshing, and selection. The A-DEPSO algorithm is described in the following sections.
Mutation in A-DEPSO. Generally, a mutation operator can promote the efficiency of an optimization method 75,76 . The A-DEPSO algorithm takes advantage of a powerful mutation operator to increase its local and global search ability. The proposed mutant vector (XDE_{l,j}) is generated from the mutant vector of DE (Eq. 27) and the vector created by the PSO algorithm (Eq. 29), whose velocity is formulated as,
V_{l,j} = w·V_{l,j} + c_1·rand_1·(xp_{l,j} − x_{l,j}) + c_2·rand_2·(x_g − x_{l,j}),
where β and δ denote two constant numbers, G denotes an adaptive parameter for scaling the differential vectors, ρ is a random number in the range [0, 1], c_1 and c_2 denote two constants both equal to 1.5, rand_1 and rand_2 denote two random numbers in the range [0, 1], w denotes an inertia factor that controls the velocities of the particles, and randn denotes a random number with normal distribution. l and L denote the current iteration number and the maximum number of iterations, respectively. xp_{l,j} and x_g are the personal best of solution j and the best-so-far solution, respectively.
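The PSO part of the mutation, i.e., the velocity update with c_1 = c_2 = 1.5, can be written as a short function. This sketch covers only the velocity equation and not the full A-DEPSO mutant vector; drawing one rand_1 and one rand_2 per update (rather than per dimension) and the function name are assumptions.

```python
import random


def pso_velocity(v, x, x_personal, x_global, w, c1=1.5, c2=1.5, rng=random):
    """V = w*V + c1*rand1*(xp - x) + c2*rand2*(xg - x), applied element-wise."""
    r1, r2 = rng.random(), rng.random()  # rand_1, rand_2 in [0, 1)
    return [
        w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        for vi, xi, pi, gi in zip(v, x, x_personal, x_global)
    ]


# When the particle already sits on both bests, only the inertia term remains
position = [0.3, -1.2]
v_new = pso_velocity([1.0, -2.0], position, position, position, w=0.5)
```

The inertia factor w damps the previous velocity, while the two random terms pull the particle toward its personal best and the best-so-far solution.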
Crossover in A-DEPSO. The A-DEPSO uses a new binomial crossover (BC) to boost the population diversity. The BC merges three vectors, namely the vector X_new, x_g, and the current solution (x_{l,j}), by using an adaptation rate parameter (A_r) to create the crossover vector (z_i), where A_r denotes the adaptive rate for the BC, which is expressed as Eq. (34), p_a and p_b denote two random parameters in the range [0, 1], and i_rand denotes a random integer in the range [1, D]. m_cr is equal to 0.5 in the first iteration, and its value is updated based on the relationship suggested by Ref. 77 , where µ is equal to 0.1 and S_Ar denotes the set of all successful A_r values over the iterations. According to Eq. (35), the A-DEPSO determines the best value of A_r at each iteration, which helps it search the solution space appropriately.
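The adaptation of m_cr can be illustrated with a moving-average update. The exact relationship of Ref. 77 is not reproduced in the text, so the form below, m_cr ← (1 − µ)·m_cr + µ·mean(S_Ar), is an assumption modeled on the update commonly used in adaptive DE variants; the function name and the example rates are hypothetical.

```python
def update_m_cr(m_cr, successful_rates, mu=0.1):
    """Assumed update: blend the old m_cr with the mean of the successful A_r values."""
    if not successful_rates:
        # No successful crossover rates in this iteration: keep the current value
        return m_cr
    mean_success = sum(successful_rates) / len(successful_rates)
    return (1.0 - mu) * m_cr + mu * mean_success


m_cr = 0.5  # initial value stated in the text
m_cr_next = update_m_cr(m_cr, [0.9, 0.7])
m_cr_same = update_m_cr(m_cr, [])
```

With µ = 0.1 the value drifts slowly toward the crossover rates that recently produced improvements, which keeps the adaptation stable.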
Refreshing operator in A-DEPSO. The refreshing operator (RO) is added to the A-DEPSO to enhance the convergence speed. The RO creates a new vector (z_i) from the solution (z_i) generated by the BC and two solutions x_1 and x_2 , which serve to promote the exploitation capability of the A-DEPSO. The RO formulation uses LC, a logistic chaotic map 78,79 (LC = 4·LC·(1 − LC)) whose initial value is 0.7. The LC is applied to increase the random behavior of the A-DEPSO and to avoid local solutions.
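The logistic chaotic map is simple to iterate, and a few steps show how quickly its values spread over (0, 1). The sketch below generates the sequence from the stated initial value of 0.7; how LC enters the refreshing operator itself is not shown, and the function name is hypothetical.

```python
def logistic_chaotic_sequence(n, lc0=0.7):
    """Return n successive values of LC = 4 * LC * (1 - LC), starting from lc0."""
    values = []
    lc = lc0
    for _ in range(n):
        lc = 4.0 * lc * (1.0 - lc)
        values.append(lc)
    return values


seq = logistic_chaotic_sequence(3)  # approximately [0.84, 0.5376, 0.99434496]
```

Starting from 0.7, the values move to roughly 0.84, 0.5376 and 0.9943, which illustrates the non-repeating behavior used to diversify the search.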
Selection operator in A-DEPSO. A-DEPSO uses the selection operator (SO) to determine whether the solution z is better than the current solution (x_{l,j}) or not. Based on the SO, the solution in the next iteration (x_{l+1,j}) is z if z has the better fitness, and x_{l,j} otherwise.
Performance evaluation. This section introduces seven statistical metrics for assessing the efficiency of the five ML models: root mean square error (RMSE) 39 , correlation coefficient (R) 6 , mean absolute error (MAE) 80 , relative absolute error (RAE) 80 , Willmott's agreement index (IA) 81 , Legates and McCabe's index (E) 82 , and mean absolute percentage error (MAPE) 83 .
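The seven metrics can be computed together from the observed and predicted series. The formulas below are the common textbook definitions and are assumptions, since the paper's own equations are not reproduced in the text; MAPE is returned in percent, and the function name and example values are hypothetical.

```python
import math


def evaluation_metrics(observed, predicted):
    """Return RMSE, R, MAE, RAE, IA, E and MAPE as a dict (textbook definitions, assumed)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    errors = [o - p for o, p in zip(observed, predicted)]
    abs_err = sum(abs(e) for e in errors)
    sq_err = sum(e * e for e in errors)
    abs_dev = sum(abs(o - mean_o) for o in observed)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    # Willmott's potential error term
    potential = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2
                    for o, p in zip(observed, predicted))
    return {
        "RMSE": math.sqrt(sq_err / n),
        "R": cov / math.sqrt(var_o * var_p),
        "MAE": abs_err / n,
        "RAE": abs_err / abs_dev,
        "IA": 1.0 - sq_err / potential,
        "E": 1.0 - abs_err / abs_dev,
        "MAPE": 100.0 / n * sum(abs(e / o) for e, o in zip(errors, observed)),
    }


# Hypothetical example: predictions with a constant bias of +1
metrics = evaluation_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

The example shows why several metrics are reported together: a constant bias leaves R at 1 while RMSE, MAE and E clearly penalize it.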

Study area
In this study, the monthly parameters, namely discharge and electrical conductivity, originate from the Tange-Takab gauging station (Longitude 50° 20′ 02″, Latitude 30° 41′ 09″, and 280 m above mean sea level), located on the Maroon River in Khuzestan province, Iran. The exact location of the Tange-Takab gauging station is illustrated in Fig. 3. This river, with a drainage area of 6824 km² and a length of almost 310 km, has a profound impact on supplying drinking water, irrigation and recreation for Iranians, in particular the residents of the southeastern regions of Iran. The Maroon basin has an average annual temperature of almost 24 °C and an average annual precipitation of 350.04 mm.
Pre-processing and selecting the best combination. The data collected over a span of 36 years (21-March-1980 to 16-Feb-2016, 432 months) are used to simulate EC on a monthly basis through ML models. The monthly data are divided into two distinct sets, with 70% (302 months) dedicated to training and 30% (130 months) to testing. Fig. 4 (upper) depicts the independent variable, the time series of Q, and Fig. 4 (lower) illustrates the EC time series, the target variable, over the training and testing periods. Table 1 provides various statistical criteria, namely minimum (MIN), maximum (MAX), average (AVG), range, standard deviation (SD), skewness (S), kurtosis (K), and autocorrelation coefficients (AC), for the training, testing, and all data points.
Choosing the optimum combination of input variables is a significant stage in time series forecasting by ML models, in which the consecutive time-lagged data are influential to a great extent [84][85][86] . There is no fixed criterion to specify the number of lags; however, the auto-correlation function (ACF), partial auto-correlation function (PACF) and cross correlation (CC) statistical methods are used to detect the input combination in hydrological models 6,87 .
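The 70/30 split and the construction of time-lagged inputs can be checked with a few lines. This is a sketch under assumptions: the split is taken to be chronological with the training size truncated to an integer, and the helper names and toy series are hypothetical.

```python
def chronological_split(n_samples, train_fraction=0.7):
    """Return (n_train, n_test) for a split that keeps the time order."""
    n_train = int(n_samples * train_fraction)
    return n_train, n_samples - n_train


def lagged_rows(q, ec, q_lags, ec_lags):
    """Build (features, target) rows: features are Q_{t-k} and EC_{t-k}, target is EC_t."""
    max_lag = max(max(q_lags), max(ec_lags))
    rows = []
    for t in range(max_lag, len(ec)):
        features = [q[t - k] for k in q_lags] + [ec[t - k] for k in ec_lags]
        rows.append((features, ec[t]))
    return rows


n_train, n_test = chronological_split(432)  # 302 training and 130 testing months

# Hypothetical short series to show the row layout
q = [float(i) for i in range(10)]
ec = [100.0 + i for i in range(10)]
rows = lagged_rows(q, ec, q_lags=(0, 1), ec_lags=(1, 2))
```

Each row loses as many leading months as the longest lag, so the number of usable samples shrinks slightly as more lags are included.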
In Fig. 5, the ACF is used to identify the effective input parameters. As can be observed, the AC of the 1-month and 2-month lagged signals has a more significant influence (more than 55%) on the original input datasets (Q_t) and EC_t in comparison with the longer lags (Q_{t−3}, Q_{t−4}, …; EC_{t−5}). In addition, based on the PACF, the 1-month lagged signal for Q_t and the 1-month to 4-month lagged signals for EC_t can be considered.
Fig. 6 reveals that the highest correlation belongs to the input signal at the current time (Q_t), its first two time-lagged signals (Q_{t−1}, Q_{t−2}) and the first four time-lagged signals of EC_t (EC_{t−1}, EC_{t−2}, EC_{t−3}, EC_{t−4}), which have a more significant effect on creating a predictive model than Q_{t−3} and EC_{t−5}. Moreover, comparing the cross-correlation values between the target signal (EC_t) and the input signals shows that EC_{t−1} and EC_{t−2}, with the greater correlation coefficients of 0.68 and 0.45 respectively, play an important part in predicting the target WQ parameter. As a result, by analyzing the ACF and CC, lags of up to 2 months for Q and up to 4 months for EC were accepted for predicting the current month's EC_t. Then, in order to determine the best input patterns amongst all possible patterns, the best subset regression analysis was applied. Simply put, to choose the optimal input pattern for each WQ target, four distinct criteria, namely R², adjusted R², Mallows' C_P 88 , and the Amemiya prediction criterion (PC) 89 , are used. C_P and PC are defined as follows 90 , in which MSE_m is the mean squared error, i is the number of predictors, RSS_i is the residual sum of squares, and N is the number of historical data points. Table 2 lists the result of the best subset regression analysis for EC_t, which was used to select the four most appropriate patterns, i.e., the optimum inputs for the predictive models, based on the best values of R² ([56.90 57.00%]), C_P ([4.74 8.00]) and PC ([0.441 0.444]). Admittedly, this method alone is unlikely to ensure the accuracy of the most suitable input combinations.
In turn, it is essential to take other statistical conditions into account, including the Pearson correlation between the basic input parameters and the target parameter and the multicollinearity analysis between inputs, to raise the certainty of the combination selection. Admittedly, the current Q and the time-lagged EC time series considerably affect the input combination for the target signal. Hence, in order to predict the current EC_t on a monthly basis, four input combinations, shown in boldface in Table 2, were provided separately to the ML-based predictive models.
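The two selection criteria are quick to compute once a candidate subset has been fitted. The forms below, C_P = RSS_i / MSE_m + 2i − N and PC = ((N + i) / (N − i))·(1 − R²), are the commonly used definitions and are assumptions, since the paper's equations are not reproduced in the text; whether i counts the intercept is also an assumption, and the numbers are hypothetical.

```python
def mallows_cp(rss_i, mse_m, i, n):
    """Mallows' C_P (assumed form): RSS_i / MSE_m + 2*i - N."""
    return rss_i / mse_m + 2 * i - n


def amemiya_pc(r_squared, i, n):
    """Amemiya prediction criterion (assumed form): (N + i) / (N - i) * (1 - R^2)."""
    return (n + i) / (n - i) * (1.0 - r_squared)


# Hypothetical subset: 5 parameters fitted on 100 samples
cp = mallows_cp(rss_i=100.0, mse_m=1.0, i=5, n=100)
pc = amemiya_pc(r_squared=0.5, i=10, n=110)
```

Under these definitions a good subset has C_P close to the number of fitted parameters and a small PC, so both criteria penalize adding inputs that do not reduce the residual error.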

Application results and discussion
In this paper, the A-DEPSO is developed to find the optimal parameters of the ANFIS model and to enhance the convergence speed. The efficiency of the proposed method is compared with the LSSVM, MARS, and GRNN models in predicting the EC parameter in standalone and wavelet-complementary frameworks. In this regard, ANFIS-A-DEPSO predicts the EC while overcoming the demerits of the basic ANFIS algorithm by optimizing the coefficients that define the membership functions, which leads to more accurate predictions. The position of each member in the A-DEPSO algorithm encodes the values of the consequent (α_1, β_1, γ_1, α_2, β_2, γ_2) and membership parameters (a_k, e_k, m_k) of the ANFIS model. The baseline parameter values were treated as the starting locations of the solutions. The root mean square error (RMSE) was used as the fitness function to measure prediction accuracy. The hybrid ANFIS model was run until the RMSE was reduced to a minimum and the method converged toward the best solution. With every update of the solutions' positions, improved values of the design variables were found.
Likewise, the ANFIS, LSSVM, MARS, and GRNN, as prominent machine learning methods, were used to confirm the predictive ability of ANFIS-A-DEPSO, which is the major novelty of this research. The main setting parameters of the LSSVM, GRNN, and MARS were obtained by trial and error. Table 3 lists the values of the control parameters for these methods. It should be noted that the population size and the maximum number of iterations for the A-DEPSO algorithm are equal to 50 and 200, respectively.
Wavelet-ML models. Another effective way to raise the certainty of predictive hydrological models is the complementary data-intelligence model, which couples a discrete or continuous wavelet transform (DWT and CWT, respectively) with an ML model and depends on an appropriate mother wavelet and decomposition level. Two mother wavelets, namely discrete Meyer (Dmey) and Biorthogonal 6.8 (Bior6.8), have proven their noteworthy ability in WQ predictive models, mainly because they have compact support and are useful in producing time localization 12,15,20 . In this research, these mother wavelets (i.e., bior6.8 and dmey) were used to decompose the time series. The optimal decomposition level of the wavelet transform for the WQ time series was formulated as Eq. (50) 6 , in which N is the number of data points, here 432. So, the decomposition level will be 3. As a result, the basic signals used in the EC modeling were divided into three levels of details and approximations, as in Fig. 7.
In the next step, the influential sub-series were collected (e.g., Q = A_3 + Σ_{i=1}^{3} D_i) and arranged as the input variables of the five wavelet-based ML models according to the input combinations for the EC. Figure 8 shows the details (Ds) and approximations (As) of the decomposed signals for the EC simulation. Figure 9 displays the flowchart of the ML models for forecasting the EC parameter.
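The decomposition level and the additive structure Q = A_3 + D_1 + D_2 + D_3 can be illustrated without a wavelet library. This is a sketch under stated assumptions: the level formula nMW = int[log(N)] is read with a base-10 logarithm and rounding to the nearest integer, because that reading reproduces the stated level of 3 for N = 432, and the Haar wavelet is substituted for dmey and bior6.8 so that the example stays self-contained. Function names and the toy signal are hypothetical.

```python
import math


def decomposition_level(n):
    """nMW = int[log10(N)], with int read as rounding to the nearest integer (assumed)."""
    return round(math.log10(n))


def haar_mra(signal, levels):
    """Additive Haar multiresolution analysis.

    Returns (A_levels, [D_1, ..., D_levels]) as full-length lists such that
    signal = A_levels + D_1 + ... + D_levels. Requires len(signal) % 2**levels == 0.
    """
    n = len(signal)
    if n % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    previous = list(signal)
    for j in range(1, levels + 1):
        block = 2 ** j
        approx = []
        # Haar approximation at level j: block means over 2**j samples
        for start in range(0, n, block):
            mean = sum(signal[start:start + block]) / block
            approx.extend([mean] * block)
        # Detail at level j: what the coarser approximation leaves out
        details.append([p - a for p, a in zip(previous, approx)])
        previous = approx
    return previous, details


level = decomposition_level(432)
signal = [float(i % 7) for i in range(16)]
a3, ds = haar_mra(signal, 3)
rebuilt = [a + d1 + d2 + d3 for a, d1, d2, d3 in zip(a3, ds[0], ds[1], ds[2])]
```

Each sub-series (A_3, D_1, D_2, D_3) of an input such as Q can then be fed to the ML model in place of the raw series.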
Evaluate the performance of standalone ML models. In this subsection, the four combinations of input parameters are used to evaluate the ability of the five standalone ML models in forecasting EC_t; Tables 4 and 5 give the results for the training and testing stages, respectively. Based on previous studies 3,11,44,91 , the model with the best results in the test period shows the best performance, so the results of the test period are examined to determine the best model in this study. The outcomes of the ANFIS-A-DEPSO model, the first standalone one, are addressed first (Table 5).
Fig. 10 plots the observed against the predicted WQP values for the training and testing phases. It is clear that the error in the predicted values obtained through two ML models amounts to ± 40%. Therefore, the five standalone ML models are not appropriate for predicting the EC. Figure 11 illustrates the distribution of the predicted and measured EC values obtained through the ANFIS-A-DEPSO, ANFIS, LSSVM, MARS and GRNN models for all datasets.
nMW = int[log(N)]. (50)
The decomposition of the time series of EC is implemented by two mother wavelets (i.e., bior6.8 and dmey). Four combinations of input variables are used to assess the ability of the W-ML models with the different mother wavelets. Table 6 provides the optimal parameters of all the wavelet-based models. The prediction certainty of the W-ANFIS-A-DEPSO models for the two mother wavelets and all combinations is reported in Table 7, which shows that the best combinations for the W-ANFIS-A-DEPSO model with the Dmey and Bior6.8 mother wavelets are Comb 4 (R = 0.990, RMSE = 51.193, MAPE = 2.143, E = 0.979, and PI = 0.480) and Comb 1 (R = 0.988, RMSE = 54.064, MAPE = 13.5676, E = 0.977, and PI = 0.518), respectively, for EC_t prediction in the testing phase.
The outcomes reveal that Dmey has the more appropriate performance in comparison with Bior6.8. Concerning the W-ANFIS model, the best combination is Comb 4 for both mother wavelets (Table 8). From the outcomes of the mother wavelets for the best combination, it is readily apparent that the best mother wavelet for W-ANFIS is Dmey, thanks to its higher accuracy.
In the case of W-LSSVM, based on Table 9, the best combination for both mother wavelets is Comb 4. The results of the Bior6.8 and Dmey mother wavelets for the best combination are: W-LSSVM-Dmey (R = 0.984, RMSE = 64.727, MAPE = 2.638, E = 0.967, and PI = 0.599) and W-LSSVM-Bior6.8 (R = 0.985, RMSE = 62.553, MAPE = 2.564, E = 0.969, and PI = 0.572). Accordingly, the best mother wavelet for the W-LSSVM is Bior6.8, which has a higher precision than Dmey. With regard to Table 11, the W-GRNN outcomes reveal that the best combination is Comb 1 for both mother wavelets (i.e., Dmey and Bior6.8). Statistically, the best combinations of W-GRNN-Dmey and W-GRNN-Bior6.8 are Comb 1 (R = 0.811, RMSE = 214.160, MAPE = 9.055, E = 0.634, and PI = 0.937) and Comb 1 (R = 0.810, RMSE = 219.231, MAPE = 9.238, E = 0.617, and PI = 0.960), respectively. As a result, the best model is W-GRNN-Dmey, with a PI equal to 0.937. Figure 12 compares the predicted and observed EC_t for the five W-ML models with the best combination of each mother wavelet. As can be observed, W-ANFIS-A-DEPSO-Dmey (Comb 4) outperforms the W-ANFIS-A-DEPSO with the Bior6.8 mother wavelet and the four other models with all mother wavelets. Relative to Fig. 10, it is clearly observed that the hybrid W-ANFIS-A-DEPSO model can improve the correlation coefficient (R = 0.988) by up to 52% compared to the standalone ANFIS-A-DEPSO (R = 0.672) during the test period.
The spider plot based on seven metrics for the top four models, along with the best combination of input variables, is displayed in Fig. 13. In this diagram, a model is more reliable the closer its values of R, I_A and E_{L,M} are to 1 and the closer its values of RMSE, MAE, RAE, and MAPE are to the center of the diagram. According to Fig. 13 (lower panel), the most effective model for raising the accuracy of forecasting EC_t is W-ANFIS-A-DEPSO-Dmey, with the largest R, I_A, and E_{L,M} and the smallest RMSE, RAE, and MAPE for both training and testing stages.
The correlation coefficient (R) was examined with the Taylor diagram to assess the overall ability of the models, as it presents the models' efficiency in detail 5,11 . Based on R and the standard deviation, the diagram gives a more perceptible and persuasive picture of the connection between the predicted and observed WQ parameters. The Taylor diagram in Fig. 14 shows the current monthly EC with the Bior6.8 mother wavelet (upper panel) and with the Dmey mother wavelet (lower panel) for all ML models. As a result, W-ANFIS-A-DEPSO has the most suitable performance for EC prediction compared with the other models and is the closest model to the target point.
Compare the performance of all ML models. The contrastive analysis is provided in this section to determine the best model. Consequently, five ML models along with two mother wavelets and four combinations of input variables are under review to forecast the EC in this research. As mentioned before, the W-ANFIS-A-DEPSO-C4 (Dmey), W-ANFIS-C4 (Dmey), W-LSSVM-C4 (Bior6.8), W-MARS-C1 (Dmey), and W-GRNN-C1 (Dmey) have the better performance amongst all models for EC t prediction. Figure 15 demonstrates the physical trend of five methods to further address their abilities, which results in the disability of standalone ML models in prediction for the EC t . Since there are high variations and the characteristic non-linear correlation between the WQ parameters, making a steady model through ANFIS-A-DEPSO, ANFIS, LSSVM, MARS, and GRNN is sophisticated matter. In turn, the aim is to enhance five meticulous ML models concerning wavelet theorem and assess the impact of wavelet transform joined with ML models for EC prediction. According to Fig. 15 W-MLs have the edge over standalone ML models without wavelets in terms of efficiency.  To implement a fair comparison, the population size and the maximum number of iterations (MaxIt) are equal to 50 and 300 for all hybrid models respectively, except for ANFIS-A-DEPSO, which is equal to 50. In fact, by choosing the value of 50 for MaxIt, we try to show that the proposed model can provide better performance than other models in a much smaller number of iterations. Table 12 reports the parameter settings of all methods. According to the selected hybrid models, the PSO, GWO, and WOA do not have any parameter settings. For instance, PSO uses a weighted factor ( w ), decreasing with a linear relationship, to damp its velocity, so it does not need to be set. It should be noted that the values of parameters c1 and c2 are constant. In Refs. 74,92 their values are recommended equally to 1.5 for both of them. 
The same is true of the other two methods (i.e., W-ANFIS-GWO and W-ANFIS-WOA), which likewise require no parameter settings. In this section, all hybrid models were applied to predict the EC parameter using Combo 4-Dmey as the input, because this combination performed best for the ANFIS model in the previous sections. Figure 17 displays the convergence graphs of all hybrid models for predicting the EC parameter. The figure clearly shows that the proposed model converges to a lower value (83.955) than the other hybrid models. Moreover, the proposed model reaches a better RMSE value in fewer than 10 iterations, while the other hybrid models do not converge to a suitable solution even after 50 iterations. This again confirms the superiority of the proposed model over the other hybrid models. Tables 13 and 14 give the statistical outcomes of W-ANFIS-A-DEPSO and the three other hybrid models for predicting the EC parameter in the training and testing stages. According to these tables, the proposed W-ANFIS-A-DEPSO can provide better results in terms of RMSE (train: 83

Conclusion
By designing a promising model called wavelet-ANFIS-A-DEPSO with two mother wavelets (Dmey and Bior6.8), EC_t can be predicted on a monthly basis in surface water. A powerful optimization method, A-DEPSO, was developed to increase the ability of ANFIS models. A-DEPSO is a hybrid of DE and PSO with enhanced exploration and exploitation and two adaptive parameters. In addition, a novel crossover with adaptive parameters was used to increase the diversity of the population. Moreover, a refreshing operator was implemented to raise the chance of escaping from local solutions. Four ML models (i.e., ANFIS, LSSVM, MARS, and GRNN) were used to forecast the EC in order to evaluate the proposed model's efficiency. Besides, standalone ML models were used to assess the predictive ability of all W-ML models for the EC water quality parameter through several metrics and validation procedures. The monthly time series of Q and EC over 36 years in the Maroon River were used with two and four time lags, respectively. These two and four time lags were identified by statistical procedures, and three decomposition levels were used for each mother wavelet. More specifically, best subset regression analysis was used to identify the best input combination for EC prediction. More importantly, the W-ML models improved the certainty of EC modelling. Dmey, coupled with the ANFIS-A-DEPSO, ANFIS, MARS, and GRNN models, gave the most noticeable improvement in simulation accuracy, although Bior6.8 also performed adequately. On the other hand, Bior6.8 coupled with LSSVM performed better than Dmey.  , it can be clearly determined that the proposed model has a better efficiency than the W-ANFIS model.
Moreover, according to the graphical analysis (i.e., scatter plots, time series plots, Taylor diagram, and violin plot), it is evident that the proposed model can predict the EC parameter more accurately and reliably than the other models.
Furthermore, the suggested model was compared to three hybrid models (W-ANFIS-PSO, W-ANFIS-GWO, and W-ANFIS-WOA) to evaluate its effectiveness. The findings show that the suggested model is more accurate in terms of RMSE (train: 83.955, test: 51.193) and MAPE (train: 3.607, test: 2.1427) than the other models.
To sum up, across all the ML-based models considered, it is clear that W-ANFIS-A-DEPSO, a supplementary model, is able to predict the EC accurately. As suggestions for future work, first, it could be implemented as an ensemble multi-wavelet model in order to use several wavelets simultaneously. Second, designing an ensemble ANFIS-based method could have a positive impact on the prediction of WQPs in surface water, since it may combine the merits of each supplementary procedure. Finally, other optimization methods could be applied to optimize the main parameters of the ANFIS model 95–97 .
Relative deviation in forecasting the EC using ML models coupled with Bior6.8 (upper) and Dmey (lower) mother wavelets.
Table 14. Comparison of W-ANFIS-A-DEPSO with three hybrid models in the testing stage.

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.